TURBOSTATS Survey Analysis System
=================================
M. C. Hart
698 Uppingham Road
Thurnby
Leicestershire
LE7 9RN
Contents :
========
General Introduction ... Page 1
How does TURBOSTATS work ? ... Page 2
Brief description of the TURBOSTATS modules ... Page 4
Running TURBOSTATS ... Page 6
Description of the individual TURBOSTATS
modules :
TS-FREQ1 ... Page 7
TS-CROSS ... Page 9
TS-STATS ... Page 13
TS-ENTRY ... Page 17
TS-CASES ... Page 17
TURBOSTATS utilities
SD (Sorted Directory) ... Page 18
SNAPSHOT ... Page 18
Interfacing with Graphics ... Page 18
TURBOSTATS capacities ... Page 19
DO's and DON'Ts ! ... Page 20
Version 2.01
Issued : October, 1989
Public domain version : September, 1991
Page 1
GENERAL INTRODUCTION
====================
TURBOSTATS is the name given to a suite of programs designed
to work with each other in the analysis of survey data. Each
program may be run as a 'stand-alone' program or as part of
an integrated system. The TURBOSTATS system is closely
modelled upon the SPSS (Statistical Package for the Social
Sciences) statistical package and is designed to give output
similar to that offered by the SPSS 'Frequencies' and
'Crosstabulations' commands.
The analysis of survey material tends to fall into the
following categories :
(i) the counts and percentages of the various
values taken by a single variable ( e.g.
those replying 'Yes', 'No' or 'Do not Know'
in response to a survey question). From
this we can form a FREQUENCY DISTRIBUTION.
(ii) the formation of tables typically involving
two variables known as CONTINGENCY or
CROSS-TABULATION tables. For example, we
could have a table with a variable SEX
subdivided into 'Male' and 'Female' on one
axis whilst the other axis might be a
variable INCOME subdivided into 'High' and
'Low'. The CONTINGENCY TABLE would display
the numbers of cases that fall into each of
the resulting 'cells' as well as computing
other relevant statistics.
(iii) hypothesis tests designed to measure whether
the mean of one variable or sub-group in the
data differs significantly from that of
another variable or sub-group in the data.
Another form of hypothesis test might be to
discover whether, in a contingency table,
the type of newspaper read differs by sex,
for example.
Page 2
HOW DOES TURBOSTATS WORK ?
========================
In order to function, the TURBOSTATS modules require two
files of data :
(i) a data file consisting of numbers, separated from
each other by spaces, commas or semi-colons. Such
a file is often known as a CSV (Comma Separated
Value) file e.g.
1,2,3
1,4,5
2,6,2
..
etc.
(ii) a labels file which will supply names for the
individual variables ( e.g. SEX,PAPER) and labels
for the individual values that each variable might
take. For example SEX would typically have labels
of 'Male' and 'Female' whilst PAPER might have
'Quality', 'Tabloid','Sunday' etc.
These files can be created in several ways. For fairly
small surveys ( e.g. 100 cases or less) you could use the
TS-ENTRY module. For larger surveys, it might be more cost-
effective in terms of time to input data using dBASE III and
to create data files with the dBASE III command :
COPY TO filename.ext DELIMITED
It is also possible to create the data and labels files by
using your favourite word-processor or text-editor (e.g.
WordStar in non-document mode). In the latter case, though,
you would not have the benefit of any error-checking or
correction facilities. A labels file might look like the
following :
"SEX","Sex of Individual"
"SEX","Male"
"SEX","Female"
"CLASS","Social Class"
"CLASS","Professional"
"CLASS","Intermediate"
"CLASS","Skilled Manual"
"CLASS","Semi-skilled Manual"
"CLASS","Unskilled Manual"
"CLASS","Pensioners"
"CLASS","Not classified"
"PAPER","Newspaper read"
"PAPER","None"
"PAPER","Quality"
"PAPER","Middle-brow"
"PAPER","Tabloid"
..
etc.
Page 3
The TURBOSTATS system will assume that the first variable
name encountered in a labels file will relate to the first
column of data found in a data file. Similarly, the second
variable found will relate to the second column of data and
so on. Care should be taken to ensure that the variable
names match up with the various columns of numbers as
TURBOSTATS has no way of 'knowing', other than by position
in a list, which variable name matches up with which column
of data.
The labels work in a similar fashion. Once the TURBOSTATS
system has identified the 'starting point' in the labels
file, then it is assumed that:
- the first entry will be a label which expands upon
the name of the variable (known as a VARIABLE
LABEL). For example, FINCOME is a variable name
which you might wish to label as 'Father's income'.
- each subsequent label relates to the various values
taken by the variable and consequently is known as
a VALUE LABEL. The labels should cover the range
from the minimum to the maximum values of that
variable likely to be encountered in the data set.
In this respect, TURBOSTATS does not differ materially from
the SPSS philosophy. Care should be exercised to ensure that
variable names match up with the appropriate columns.
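The positional matching described above can be sketched in Python as follows. This is only an illustration of the file layout, not the TURBOSTATS code itself: the function names parse_labels and parse_data are hypothetical, and the sample text is a shortened version of the labels file shown earlier.

```python
import csv
import io
import re


def parse_labels(text):
    """Return (name, variable_label, value_labels) tuples in file order."""
    variables = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) != 2:
            continue  # this sketch simply skips blank or malformed lines
        name, label = row[0].strip().upper(), row[1].strip()
        if variables and variables[-1][0] == name:
            variables[-1][2].append(label)        # later entries: VALUE LABELS
        else:
            variables.append((name, label, []))   # first entry: VARIABLE LABEL
    return variables


def parse_data(text):
    """Split each non-blank line on commas, semi-colons or spaces."""
    return [[float(item) for item in re.split(r"[,;\s]+", line.strip())]
            for line in text.splitlines() if line.strip()]


labels_text = '''"SEX","Sex of Individual"
"SEX","Male"
"SEX","Female"
"PAPER","Newspaper read"
"PAPER","None"
"PAPER","Quality"
'''
data_text = "1,2\n2 1\n1;2\n"

variables = parse_labels(labels_text)
# Position in the labels file is the only link to a column of data.
column_of = {name: index for index, (name, _, _) in enumerate(variables)}
rows = parse_data(data_text)
sex_values = [row[column_of["SEX"]] for row in rows]
print(variables[0], sex_values)
```

Because the link is purely positional, swapping the SEX and PAPER blocks in the labels file would silently attach the wrong names to the columns, which is why the order needs checking by hand.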
Missing Values
~~~~~~~~~~~~~~
A problem with all survey material is what to do with those
cases where, for a variety of reasons, the question has not
been completed. For example, a question on 'Father's
Income' cannot be answered if the respondent's father is dead
or if the income is unknown. In such cases, the survey
analyst assigns a 'MISSING VALUE' number to such cases, e.g.
the number 0, 9 or -1, as long as it is an integer (i.e. a whole
number). In subsequent analyses, TURBOSTATS will request
MISSING VALUE code numbers and use these to exclude data
from further analysis (although typically reporting the
number of cases that fall into the MISSING VALUES category).
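A minimal sketch of this exclusion, in Python, is given below. The helper names split_missing and valid_pairs are hypothetical, and the rule for two variables (a case is kept only if neither value is missing) is an assumption based on the 'pair-wise deletion' wording in the TS-STATS sample output later in this manual.

```python
def split_missing(values, missing_codes):
    """Return the valid values and the number of missing cases."""
    valid = [v for v in values if v not in missing_codes]
    return valid, len(values) - len(valid)


def valid_pairs(xs, ys, missing_x, missing_y):
    """Keep only the cases in which neither variable is missing."""
    return [(x, y) for x, y in zip(xs, ys)
            if x not in missing_x and y not in missing_y]


sex = [1, 2, 9, 1, 2]      # 9 is the MISSING VALUE code
social = [3, 9, 4, 5, 1]   # 9 is the MISSING VALUE code

valid_sex, n_missing = split_missing(sex, {9})
pairs = valid_pairs(sex, social, {9}, {9})
print(valid_sex, n_missing, pairs)
```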
Page 4
BRIEF DESCRIPTION OF THE TURBOSTATS MODULES
===========================================
The TURBOSTATS system provides three modules which are
designed to analyse survey data (TS-FREQ1, TS-CROSS and
TS-STATS) and a further two to aid the entry and editing of
data files (TS-ENTRY, TS-CASES). In addition, utilities are
supplied to give sorted directories and to capture screen
output into files for subsequent use in reports.
Provision is also made for access to your favourite
spreadsheet package if you wish to present your data in
graphical form.
Each of these will now be described briefly :
TS-FREQ1 provides for the frequency distribution of
the values in a single variable measured at
the nominal level. This is the module best
used to analyse the patterns of response to a
single question. The output consists of
counts, percentages and a simple bar-chart.
It is also possible to save results in a file
should you wish to import these later into a
graphics package for further analysis.
TS-CROSS provides for contingency tables of two
variables measured at the nominal level. This
is the module best used to examine the
operation of two variables together ( e.g.
sex and newspaper readership). At its
simplest, TS-CROSS provides simple counts for
the number of cases that will fall into each
'cell' but it can also generate the column
percentages, row percentages, total
percentages, expected values and chi-square
values for each cell in the table.
TS-STATS is the module which can provide for the more
specialised statistical information required
on either one or two variables. If two
variables are specified then a range of
bi-variate statistics are also calculated
including the correlation coefficient, the
regression equation and the 't-test' for the
differences in means. It is also possible to
use this module to perform 't-tests' i.e.
tests of statistical significance on
sub-groupings within a variable upon request.
For example, it would be possible to discover
whether the mean income for 'Females' might
differ from the mean income for 'Males' in a
data set. It is also possible to display
histograms of variables and a scatterplot of
the joint distribution of two variables.
Page 5
TS-ENTRY is the module that is used to create the
files for :
(i) variable and label names
(ii) the input of (numerical) data.
A labels file needs to be created first in
order that the variable names can supply
prompts for the various values before the
input of numerical data.
To simplify the operation of TS-ENTRY, the
module is not designed to alter or modify
existing label files. If the modifications
are minor, this is best achieved using your
usual word-processor/text editor - in the
event of major modifications, you would be
well advised to create a brand-new labels
file in any case.
TS-CASES is a module which creates sub-files of your
data for more detailed analysis. For
example, you could create a file containing
only 'Males' so that you can then examine
relationships further within the data that
relate only to 'Males'.
A utility is provided that enables you to view a sorted
directory, operated from the principal menu, should you
forget a filename. This utility also gives you the file
size and an indication of the space free upon your disk.
Provision is also made for you to load the spreadsheet of
your choice (e.g. the LOTUS 1-2-3 clone AS-EASY-AS) in order
to access the advanced graphics capacities of such a
package.
Page 6
RUNNING TURBOSTATS
==================
To run the TURBOSTATS system is really quite simple.
(1) If you are installing the system for the first
time on a hard disk, then copy all of the files on
the disk over to a subdirectory of your choice.
Then run the TS-INSTL program.
(2) If you are running the program from either a
floppy or a hard disk, then you may run the whole
integrated system with the command
TS [Dr A:] (where Dr A: represents the
drive upon which you would
like the 'screensnap'
files to be stored)
or you may run any of the programs by name directly
i.e. TS-MENU (Menu and loader program)
TS-FREQ1 (Frequencies)
TS-CROSS (Cross-Tabulations)
TS-STATS (Statistics of one or two
variables)
TS-ENTRY (Label and data entry)
TS-CASES (Creates sub-files of data)
(3) If you run the integrated system TS then a batch
file is loaded which will make the screen capture
program (SNAP.EXE) memory resident and remind you
how it is to be activated. Make a note of the
command that is necessary to 'snap' your screen
pictures : i.e. PRTSC
Subsequently, when the program terminates, the
batch file will run another program
(DEVELOP.EXE) which will 'develop' your screen
snaps into files named SNAPSHOT.01..SNAPSHOT.30.
Make sure that you have sufficient space on disk
to hold your snapshots : each will take a maximum
of 2000 bytes. If you have 'old' snapshots on
disk ( i.e. SNAPSHOT.01 .. SNAPSHOT.30) then
rename these to another name (e.g. OLDSNAP.01 ..
OLDSNAP.30) before you start a new session as
otherwise the SNAP.EXE program will overwrite the
old 'SNAPSHOTS' found on your disk.
Page 7
DESCRIPTION OF THE INDIVIDUAL TURBOSTATS MODULES
================================================
TS-FREQ1
========
Sample input screens :
~~~~~~~~~~~~~~~~~~~~
TS-FREQ1 TURBOSTATS (c) M.C. Hart [1989]
~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~
Performs frequency counts,barcharts of raw (nominal) data..
Name of raw data file ? mysurvey.txt
Name of labels file ? labels.txt
-------------------------------------------------------------------
TS-FREQ1 TURBOSTATS File: MYSURVEY.TXT
~~~~~~~~ ~~~~~~~~~~
Performs frequency counts,barcharts of raw (nominal) data..
Variable List - [Y]es or [N]o .. [X] to exit
ID SEX CLASS PAPER
Variable ? sex
Missing Values should be integers in the range -32768..32767
e.g. [0] [9] [-1] [ 0 by default ]
Missing Values 9
-------------------------------------------------------------------
Sample output screen :
~~~~~~~~~~~~~~~~~~~~
SEX Sex of Individual File: MYSURVEY.TXT
Valid Cum
Value Label Value Frequency Percent Percent Percent
Male 1 136 50.4 51.5 51.5
Female 2 128 47.4 48.5 100.0
9 6 2.2 MISSING
------- ------- -------
TOTAL 270 100.0 100.0
Male ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ 136
Female ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ 128
Valid Cases 264 Missing Cases 6
Page 8
The important point to remember about TS-FREQ1 is that it is
designed to deal only with categorical (nominal) data. This
is data in which numbers 'stand for' categories in the data
rather than being regarded as entities in their own right.
We would not wish to perform statistical operations upon
such numbers for they are essentially 'labels' or 'flags'
that indicate different categories of the variable under
consideration. If, in a random survey, we named a variable
SEX and coded 'Female' as 1 and 'Male' as 2 then we could
count up the numbers of '1s' (i.e. Females) and '2s' (i.e.
Males) and also perform such calculations as the percentage
each contributes to the total. But if we had 50 cases of
'Male' and 50 cases of 'Female', it would not make sense to
average the numbers ( to produce a mean of 1.5) because the
numbers are essentially meaningless.
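The counts and percentages in the sample output screen above can be reproduced with a short Python sketch. The function name frequency_table is hypothetical and the raw data are rebuilt from the counts shown (136, 128 and 6); the sketch is not the TS-FREQ1 code itself.

```python
from collections import Counter


def frequency_table(values, missing_codes):
    """Rows of (value, count, percent, valid percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    valid_total = sum(n for v, n in counts.items() if v not in missing_codes)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        n = counts[value]
        percent = 100.0 * n / total
        if value in missing_codes:
            rows.append((value, n, percent, None, None))   # shown as MISSING
        else:
            valid_percent = 100.0 * n / valid_total
            cumulative += valid_percent
            rows.append((value, n, percent, valid_percent, cumulative))
    return rows


data = [1] * 136 + [2] * 128 + [9] * 6    # 1 = Male, 2 = Female, 9 = missing
table = frequency_table(data, {9})
for value, n, pct, valid_pct, cum_pct in table:
    if valid_pct is None:
        print(f"{value:>5} {n:>5} {pct:6.1f}  MISSING")
    else:
        print(f"{value:>5} {n:>5} {pct:6.1f} {valid_pct:6.1f} {cum_pct:6.1f}")
```

Note that 'Percent' is taken over all 270 cases whereas 'Valid Percent' is taken over the 264 cases that remain once the missing cases are excluded.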
The values to be counted should lie in the range 1-20, and
the module works best if they lie in the range 0-8 (with 9
used for missing values) or 1-9 (with 0 used for missing values).
Be careful to specify the exact drive and filename of your
data and label files. If you omit the extension, then
TURBOSTATS will assume that you intend a file with the .TXT
extension and will add this extension automatically to your
filename.
A barchart is generated automatically but on the next 'page'
or 'screen' if space is limited. Press the ENTER key to get
the next page of output. This instruction is NOT shown on
screen in order to keep the screen free of instructions
should you wish to capture output for a subsequent report.
You also have the chance to save your output in an output
file and should follow the system prompts carefully, making
sure that your filename is a legitimate MS-DOS filename i.e.
1-8 characters with no embedded spaces and with an extension
e.g. a:myfile.txt
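The filename rules above can be sketched in Python as follows. The helper normalise_filename is hypothetical; the permitted character set is a deliberate simplification of the MS-DOS rules, directory paths are not handled, and converting the name to upper case is an assumption suggested only by the 'File: MYSURVEY.TXT' banner in the sample screens.

```python
import re

# Optional drive letter, 1-8 character name, optional 1-3 character extension.
_DOS_NAME = re.compile(
    r"^(?:[A-Za-z]:)?[A-Za-z0-9_-]{1,8}(?:\.[A-Za-z0-9]{1,3})?$")


def normalise_filename(name, default_ext="TXT"):
    """Check a drive:name.ext filename and add .TXT if no extension is given."""
    name = name.strip()
    if not _DOS_NAME.match(name):
        raise ValueError(f"not a simple MS-DOS filename: {name!r}")
    if "." not in name:
        name = f"{name}.{default_ext}"
    return name.upper()


print(normalise_filename("a:myfile"))
print(normalise_filename("mysurvey.txt"))
```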
Page 9
TS-CROSS
========
Sample input screens :
~~~~~~~~~~~~~~~~~~~~
TS-CROSS TURBOSTATS (c) M.C. Hart [1989]
~~~~~~~~ ~~~~~~~~~~
Constructs contingency tables from raw (nominal) data..
Variable List - [Y]es or [N]o .. [X] to exit
ID SEX CLASS PAPER
First variable ? sex
Second variable ? class
Missing Values should be integers in the range -32768..32767
e.g. [0] [9] [-1] [ 0 by default ]
Missing Values 9
-------------------------------------------------------------------
The data is now entered..
In the contingency table, you have a choice of options
as well as the cell counts
These are [1] Row percentages
[2] Column percentages
[3] Total percentages
[4] Expected values
[5] Chi-square statistic
If you want to choose the option, then give the OPTION number
when prompted. Options will be printed in the order you specify..
Specify 0 if you do NOT want the option ..
First Choice [Option No] 1
Second Choice [Option No] 2
Third Choice [Option No] 4
Fourth Choice [Option No] 5
Fifth Choice [Option No] 0
--------------------------------------------------------------------
Page 10
TS-CROSS
========
Sample output screen :
~~~~~~~~~~~~~~~~~~~~
Crosstabulation of SEX Sex of Individual File: MYSURVEY.TXT
By CLASS Social Class
CLASS >│Profes Interm Skille Semi-s Unskil Pensio Not cl│ ROW
│sional ediate d Manu killed led Ma ners assifi│TOTAL
SEX │ 1 2 3 4 5 6 7 │
│──────┼──────┼──────┼──────┼──────┼──────┼──────┼
Male 1│ 24 │ 17 │ 15 │ 27 │ 33 │ 4 │ 26 │ 146
[Row %] │ 16.4 │ 11.6 │ 10.3 │ 18.5 │ 22.6 │ 2.7 │ 17.8 │51.4%
[Col %] │ 57.1 │ 77.3 │ 40.5 │ 50.9 │ 50.0 │ 15.4 │ 68.4 │
[Exp ] │ 21.6 │ 11.3 │ 19.0 │ 27.2 │ 33.9 │ 13.4 │ 19.5 │
[Chis ] │ 0.3 │ 2.9 │ 0.9 │ 0.0 │ 0.0 │ 6.6 │ 2.1 │
│──────┼──────┼──────┼──────┼──────┼──────┼──────┼
Female 2│ 18 │ 5 │ 22 │ 26 │ 33 │ 22 │ 12 │ 138
[Row %] │ 13.0 │ 3.6 │ 15.9 │ 18.8 │ 23.9 │ 15.9 │ 8.7 │48.6%
[Col %] │ 42.9 │ 22.7 │ 59.5 │ 49.1 │ 50.0 │ 84.6 │ 31.6 │
[Exp ] │ 20.4 │ 10.7 │ 18.0 │ 25.8 │ 32.1 │ 12.6 │ 18.5 │
[Chis ] │ 0.3 │ 3.0 │ 0.9 │ 0.0 │ 0.0 │ 6.9 │ 2.3 │
│──────┼──────┼──────┼──────┼──────┼──────┼──────┼
TOTAL 42 22 37 53 66 26 38 284
14.8% 7.7% 13.0% 18.7% 23.2% 9.2% 13.4% 100.0%
Valid cases = 284 Missing = 16
Total chi-square D.F. Significance Cells with E.F. < 5
26.161 6 0.0002 0 of 14 ( 0.0% )
Page 11
Contingency tables also require two variables measured at
the nominal (categorical) level. The output is designed so
that a maximum of NINE columns may be displayed horizontally
on the screen. If your data contains more than nine
categories, it may be unnecessarily complex in any case and
consideration should be given to collapsing the categories
so that there is a maximum of nine.
Several options are given as well as the cell counts
which are always supplied. These are :
- Column % (Proportion the cell contributes to the
column total)
- Row % (Proportion the cell contributes to the
row total)
- Total % (Proportion the cell contributes to the
overall total)
- Expected The value expected in each cell if
there is no relationship between the
two variables, i.e. the row total
multiplied by the column total and
divided by the overall total.
- Chi-square A value calculated from the formula :
(Observed - Expected)²
--------------------
Expected
which is then totalled over all cells
to produce a total chi-square (often
designated as X²). The 'p' value is the
probability of a chi-square at least
this large occurring by chance if the
two variables were unrelated, and will
take a value between 0 and 1. An
output of p=0.05 means that there is
only a 5% chance (1 in 20) that an
association as strong as that found in
the data would occur by chance alone.
The 5% level is the conventional
'significance level' used to test a
statistical hypothesis. A displayed
value of p=0.0000 means a probability
of less than 5 in 100,000, i.e.
practically zero. Remember that a LOW
'p' value indicates that it is likely
that the variables are significantly
related and vice versa.
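The expected values and chi-square figures in the sample SEX by CLASS table can be checked with the Python sketch below. The function names are hypothetical. The 'p' value uses a closed form that holds only when the degrees of freedom are even (as here, with D.F. = 6); a general routine would need a statistical library.

```python
import math


def chi_square(table):
    """Expected counts, per-cell chi-square, total chi-square and D.F."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    cells = [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
             for obs_row, exp_row in zip(table, expected)]
    total = sum(sum(row) for row in cells)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, cells, total, df


def chi_square_p_even_df(x, df):
    """Upper-tail probability of chi-square; valid for even D.F. only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires even, positive D.F.")
    half = x / 2.0
    term = series = 1.0
    for i in range(1, df // 2):
        term *= half / i
        series += term
    return math.exp(-half) * series


observed = [[24, 17, 15, 27, 33, 4, 26],    # Male
            [18, 5, 22, 26, 33, 22, 12]]    # Female
expected, cells, total, df = chi_square(observed)
p = chi_square_p_even_df(total, df)
print(f"chi-square = {total:.3f}  D.F. = {df}  p = {p:.4f}")
```

The largest contributions come from the 'Pensioners' column, where the observed counts (4 and 22) are furthest from the expected values.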
Page 12
Special case of a 'single value' column or row
----------------------------------------------
Under these circumstances, a normal contingency table is not
possible. However, TS-CROSS will sense this special case
and produce a 'GOODNESS OF FIT' test. For example, if we
had the following data :
PAPERS
Quality Tabloid The Rest TOTAL
SEX=1 (Male) 40 30 30 100
(Expected) 33.3 33.3 33.3
Notice that TS-CROSS has taken the 100 cases and calculated
the expected probabilities by assuming that they will be
evenly distributed ( i.e. a third or 33.3% in each cell)
before calculating the appropriate chi-square.
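For the example above, the arithmetic can be sketched in Python as follows. The function name goodness_of_fit is hypothetical, and the degrees of freedom are taken as the number of cells minus one, which is the usual convention rather than something stated in this manual. With 2 degrees of freedom the upper-tail probability reduces to exp(-chi-square/2).

```python
import math


def goodness_of_fit(observed):
    """Chi-square against an even spread of cases across the cells."""
    n = sum(observed)
    k = len(observed)
    expected = n / k
    chi = sum((o - expected) ** 2 / expected for o in observed)
    return expected, chi, k - 1


expected, chi, df = goodness_of_fit([40, 30, 30])   # Quality, Tabloid, The Rest
p = math.exp(-chi / 2.0)                            # valid because D.F. = 2
print(f"expected = {expected:.1f}  chi-square = {chi:.3f}  D.F. = {df}  p = {p:.4f}")
```

Here the chi-square works out to 2.0, which is well short of significance at the 5% level, so these figures give no grounds for concluding that the three types of paper are unequally read.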
Page 13
TS-STATS
========
Sample output screens :
~~~~~~~~~~~~~~~~~~~~~
File: MYSURVEY.TXT SEX CLASS
Measures of Central Tendency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean 1.478 4.108
Median 1.000 4.000
Mode 1.000 5.000
Measures of Dispersion
~~~~~~~~~~~~~~~~~~~~~~
Minimum 1.000 1.000
Maximum 2.000 7.000
Range 1.000 6.000
First Quartile 1.000 3.000
Third Quartile 2.000 6.000
Semi-Interquartile Range 1.000 3.000
Variance 0.250 3.567
Stan.dev [pop-n] 0.500 1.889
Stan.dev [sample] 0.500 1.892
S.E.Mean 0.029 0.112
Measures of Distribution Shape
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Skewness 0.088 -0.183
Kurtosis -1.999 -0.965
-----------------------------------------------------------------
File: MYSURVEY.TXT SEX CLASS
Numbers of Cases
~~~~~~~~~~~~~~~~
N 293 287
Missing Values 7 13
N (valid pairs) 284
Summary Statistics
~~~~~~~~~~~~~~~~~~
Σx, Σy 433 1179
Σx²,Σy² 713 5867
Σx, Σy (adjusted : pair-wise deletion) 422 1161
Σx²,Σy² (adjusted : pair-wise deletion) 698 5759
Σxy 1740
Bi-variate Statistics
~~~~~~~~~~~~~~~~~~~~~
Correlation r = 0.0554 t = 0.932 p = 0.352
Regression y (CLASS ) = 3.777 + 0.209 * x (SEX )
T-Test (difference in means) t = 22.785 D.F. = 325.04 p = 0.000
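Several of the figures in the sample screens above can be reproduced from the summary sums alone, as the Python sketch below shows. The function name summary_from_sums is hypothetical. The 'Variance' line in the sample agrees with the divisor-N (population) form, which is therefore what the sketch uses.

```python
import math


def summary_from_sums(n, total, total_sq):
    """Mean, variance and standard deviations from N, Σx and Σx²."""
    mean = total / n
    variance = total_sq / n - mean ** 2            # divisor N (population)
    sd_pop = math.sqrt(variance)
    sd_sample = math.sqrt(variance * n / (n - 1))  # divisor N-1 (sample)
    se_mean = sd_sample / math.sqrt(n)
    return {"mean": mean, "variance": variance, "sd_pop": sd_pop,
            "sd_sample": sd_sample, "se_mean": se_mean}


sex = summary_from_sums(293, 433, 713)       # N, Σx, Σx² for SEX
social = summary_from_sums(287, 1179, 5867)  # N, Σy, Σy² for CLASS
for name, stats in (("SEX", sex), ("CLASS", social)):
    print(name, {key: round(value, 3) for key, value in stats.items()})
```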
Page 14
'T'-test : Sample input and output screens:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Perform a t-test on the variables [Y]es [N]o
It is necessary to divide the variable SEX
into two groups to perform the t-test
Minimum of Group 1 1
Maximum of Group 1 1
Name you wish to give to Group 1 [8 characters or less] Males
Minimum of Group 2 2
Maximum of Group 2 2
Name you wish to give to Group 2 [8 characters or less] Females
-------------------------------------------------------------------
Twosample test of SEX by CLASS File: MYSURVEY.TXT
SEX N MEAN STDEV SE MEAN
Group 1 Males 153 4.216 2.221 0.180
Group 2 Females 140 4.264 1.845 0.156
T-Test (difference in means) t = 0.204 D.F. = 288.37 p = 0.8382
Page 15
Histogram of CLASS
Minimum of CLASS is 1.0 Histogram minimum ? 1
Maximum of CLASS is 7.0 Histogram maximum ? 7
No of classes in the histogram [2-20] ? 7
Histogram of CLASS Social Class File: MYSURVEY.TXT
CLASSES COUNT PERCENT
7.0 38 13.2% ************
6.0 29 10.1% *********
5.0 66 23.0% **********************
4.0 53 18.5% *****************
3.0 37 12.9% ************
2.0 22 7.7% *******
1.0 42 14.6% **************
--------------
Total 287 100.0%
Missing Cases 13
----------------------------------------------------------------------------
Plot of CLASS against PAPER r= 0.0378 File: MYSURVEY.TXT
┌─────────────────────────────────────────────────────────────────┐
8.0 │ * * * * * * * │
│ │
│ * * * * * * │
│ │
│ * * * * * │
│ │
│ * * * * * * * │
│ │
PAPER │ * * * * │
│ │
│ * * * * * * │
│ │
│ * * * * │
│ │
│ * * * * │
│ │
│ * * * * * * │
0.0 │ │
└─────────────────────────────────────────────────────────────────┘
1.0 CLASS Social Class 7.0
Page 16
TS-STATS will produce a range of 'univariate' statistics
on either one or two variables. If two variables are
specified, then in addition to the univariate statistics,
the following bivariate statistics are also produced :
- correlation coefficient (r) which measures the
strength of the relationship between the two
variables. The correlation coefficient
(technically known as Pearson's r) may take a
value that lies between -1 and +1, with 0
indicating no linear relationship. Note that
correlation cannot be taken to imply causation. A
t-test and probability for the correlation
coefficient are also calculated.
- regression equation in which the equation of a
'line of best fit' is calculated for the data. The
regression equation allows one to predict the
values for the dependent variable ( = y ) if given
the value of the independent variable ( = x ). For
further details of correlation and regression,
consult a standard statistical textbook.
- a t-test to test whether or not there is a
statistical difference between the means.
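The correlation and regression figures in the sample output screen can be reproduced from the pair-wise adjusted sums shown there (284 valid pairs). The Python sketch below uses the standard textbook formulae; the function name bivariate is hypothetical and the sketch is not the TS-STATS code itself.

```python
import math


def bivariate(n, sx, sy, sxx, syy, sxy):
    """Pearson's r, regression slope and intercept, and t for r."""
    cxy = sxy - sx * sy / n          # corrected sum of products
    cxx = sxx - sx * sx / n          # corrected sum of squares of x
    cyy = syy - sy * sy / n          # corrected sum of squares of y
    r = cxy / math.sqrt(cxx * cyy)
    slope = cxy / cxx
    intercept = sy / n - slope * sx / n
    t = r * math.sqrt((n - 2) / (1.0 - r * r))   # tested on n - 2 D.F.
    return r, slope, intercept, t


# x = SEX, y = CLASS: N (valid pairs), Σx, Σy, Σx², Σy², Σxy
r, slope, intercept, t = bivariate(284, 422, 1161, 698, 5759, 1740)
print(f"r = {r:.4f}  t = {t:.3f}")
print(f"y = {intercept:.3f} + {slope:.3f} * x")
```

A value of r this close to zero, together with its small t, indicates that there is virtually no linear relationship between the two variables in the sample data.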
If required, a 't-test' may be performed which allows one to
take the categories of one variable ( e.g. 1='Male' and
2='Female' in a variable named SEX) and calculate whether or
not there is a statistical difference between the two groups
with respect to the other variable chosen.
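The fractional D.F. values in the sample t-test screens are consistent with the unequal-variance (Welch) form of the two-sample t-test, which can be sketched in Python as below. This is an inference from the printed figures rather than a statement of how TS-STATS is programmed; the function name welch_t is hypothetical, and the STDEV column is assumed to hold sample standard deviations.

```python
import math


def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Group 1 'Males' and Group 2 'Females' from the sample output screen
t, df = welch_t(153, 4.216, 2.221, 140, 4.264, 1.845)
print(f"t = {abs(t):.3f}  D.F. = {df:.2f}")
```

The D.F. agrees with the sample screen (288.37). The t value comes out slightly below the printed 0.204, presumably because the means and standard deviations used here are the rounded figures shown on screen.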
You will be prompted for maximum and minimum values to
facilitate dividing one variable into two sub-groups. If
you have several categories that are not contiguous, then
you will probably have to reorder the data in your original
data file (as well as amending the corresponding labels
file).
Facilities are also available to view histograms and
scatterplots. In the case of histograms, the minimum and
maximum of each variable will be shown and you are free to
accept these or to substitute others of your own. Then you
will be asked to suggest the number of classes (i.e.
divisions) in the data. You will be well advised to choose
categories that are consistent with the data e.g. if the
minimum and maximum are 1 and 7 respectively then choose 7
classes, rather than 10.
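The reason for this advice can be seen in the Python sketch below, which sorts the CLASS counts from the sample histogram into equal-width classes. The binning rule is an assumption made for illustration (TURBOSTATS may divide the range differently), the function name class_counts is hypothetical, and the sketch expects whole-number data with a maximum greater than the minimum.

```python
def class_counts(values, lo, hi, classes):
    """Count whole-number values in equal-width classes between lo and hi."""
    counts = [0] * classes
    for v in values:
        if lo <= v <= hi:
            # Integer arithmetic keeps the class boundaries exact.
            index = min(int((v - lo) * classes // (hi - lo)), classes - 1)
            counts[index] += 1
    return counts


# Counts for CLASS values 1..7 taken from the sample histogram
counts_by_value = {1: 42, 2: 22, 3: 37, 4: 53, 5: 66, 6: 29, 7: 38}
values = [v for v, n in counts_by_value.items() for _ in range(n)]

seven = class_counts(values, 1, 7, 7)
ten = class_counts(values, 1, 7, 10)
print(seven)
print(ten)
```

With 7 classes each category has a class of its own. With 10 classes, three of the classes can never receive a case, so the histogram shows misleading gaps.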
A simple scatterplot is also available on request. Note
that the correlation coefficient between the two variables
is displayed but that TURBOSTATS does not distinguish
between multiple plots at the same screen location.
Page 17
TS-ENTRY
========
This module is used to create variable names, variable
labels and value labels as well as entering the raw data
itself. These terms are also used in SPSS but are defined
and illustrated below :
VARIABLE NAME A name of 1-8 characters from the set
[A..Z,0..9,_,-]
VARIABLE LABELS A brief label ( up to 25 characters )
which may be used to amplify the
meaning of the necessarily brief
variable name itself, e.g. INCOME
could have the label "Anticipated
Annual Salary".
VALUE LABELS A brief description of each value that
a variable may take (up to 15
characters only). Brief labels
may be preferable to long ones,
as under certain circumstances
the label is truncated (i.e.
cut down) to some eight characters.
This is most likely to happen in
TS-STATS when there are nine columns
horizontally across the screen.
The operation of TS-ENTRY is self-explanatory and you
generally have an opportunity to correct errors in both the
label entry and the data entry sections. If you wish to
amend the label files that you have already created, this is
best done with your usual word-processor/text editor.
TS-CASES
========
This module is used to create sub-files of data from your
original data set. For example, you could choose to have a
file which contains only 'Males' or alternatively a file
which excludes 'Males'.
The module is self-explanatory in operation. Generally, you
will wish to 'include' the values of the variable that you
have chosen in your new sub-file. However, it is possible
that you wish to create a file which contains all of the
values of the variable EXCEPT the ones that you have
indicated and in this case you would choose to EXCLUDE those
values from your new sub-file.
Do remember to choose a different name for your new
sub-file!
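The include/exclude choice can be sketched in Python as follows. The function name select_cases is hypothetical and the sketch works on rows already held in memory; TS-CASES itself writes the selected cases to a new sub-file.

```python
def select_cases(rows, column, values, exclude=False):
    """Keep rows whose value in `column` is (or is not) one of `values`."""
    wanted = set(values)
    return [row for row in rows if (row[column] in wanted) != exclude]


# Columns: SEX, YEAR, DRIVER (1 = Male, 2 = Female in the first column)
rows = [[1, 2, 1], [1, 1, 2], [2, 3, 2], [1, 2, 1]]

males_only = select_cases(rows, 0, {1})                  # INCLUDE 'Males'
all_but_males = select_cases(rows, 0, {1}, exclude=True)  # EXCLUDE 'Males'
print(len(males_only), all_but_males)
```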
Page 18
TURBOSTATS UTILITIES
====================
SD (Sorted Directory)
~~~~~~~~~~~~~~~~~~~~
SD is a simple utility which is available from the principal
menu and gives a sorted directory. The size of each file is
specified in bytes and there is also an indication of the
amount of free space available on the disk.
SNAP.EXE Capturing screen output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A specially written utility (SNAP.EXE) is provided and
this is made memory-resident to enable 'snaps' to be taken
of the screen. To 'snap' a picture, press PRTSC. (The
'normal' function of this key, i.e. to provide screen dumps
on the printer, will be restored later by the DEVELOP.EXE
program.) Each press records a picture of the screen in memory
and later the DEVELOP.EXE program will 'develop' these pictures into
files named SNAPSHOT.01..SNAPSHOT.30. These files may be
printed out or read into other documents if it is wished to
incorporate them into other reports. You should also ensure
that you have a disk (usually in Drive A:) with sufficient
space for the screen snaps, each of which will take a maximum
of 2000 bytes.
Interfacing with Graphics
~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are some limited plotting capabilities provided by
TURBOSTATS but it is possible to complement these with the
graphics facilities available in public domain/shareware
programs such as the LOTUS 1-2-3 'clone' 'AS-EASY-AS'.
Provision is made on the main menu for you to load the
package of your own choice. The assumption here is that the
relevant parts of the package are available on your default
drive.
Page 19
TURBOSTATS CAPACITIES
=====================
Number of cases
~~~~~~~~~~~~~~~
TS-FREQ1 and TS-CROSS 7500 cases
TS-STATS 2000 cases
Number of variables
~~~~~~~~~~~~~~~~~~~
For technical reasons, an input line from your data file may
only be 254 characters in length. Remembering that a
position is occupied by each delimiter ( e.g. a space or a
comma), then TURBOSTATS can accommodate
127 variables of length 1 (e.g. 1,2,3)
84 variables of length 2 (e.g. 10,12,14)
62 variables of length 3 (e.g. 123,456,6.7)
If you have a large data set, then consider splitting your
whole project into two or more files, ensuring that in each
file you keep together those variables that you wish to
cross-tabulate or correlate.
Number of variable/value labels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All modules 300 lines of text
Number of variable/value labels
processed by the TS-ENTRY module 200 lines of text
Page 20
DO's and DON'TS !
===============
DO :
~~
(1) Take great care that your labels file matches up
EXACTLY with your data file. Your two files
should match up as in the example below :
"SEX","Sex of Individual"        ┐                1,2,1
"SEX","Male"                     ├──── column 1   1,1,2
"SEX","Female"                   ┘                2,3,2
"YEAR","Year of Course"          ┐                1,2,1
"YEAR","First Year"              │
"YEAR","Second Year"             ├──── column 2
"YEAR","Third Year"              ┘
"DRIVER","Holds Driving Licence" ┐
"DRIVER","Can drive"             ├──── column 3
"DRIVER","Cannot drive"          ┘
(2) Ensure that the type of data that you have is
appropriate for the module that you are using to
analyse the data. The following table should
clarify the position :
┌─────────────────────────────────────────┬───────────────┐
│ TYPES OF DATA │ MODULE │
├─────────────────────────────────────────┼───────────────┤
│ Nominal (Categorical) data : │ │
│ ~~~~~~~~~~~~~~~~~~~~~~~~~~ │ │
│ Integers typically in the range 1-9 │ TS-FREQ1 │
│ used as answers to questions .. │ TS-CROSS │
│ │ TS-STATS │
├─────────────────────────────────────────┼───────────────┤
│ Interval OR Ratio data │ │
│ ~~~~~~~~~~~~~~~~~~~~~~ │ │
│ May be large numbers which may │ TS-STATS │
│ contain a decimal place. An example │ only! │
│ would be a figure for a salary (e.g. │ │
│ 9500) or a height (5.5 feet) │ │
└─────────────────────────────────────────┴───────────────┘
Page 21
(3) Make sure that your initial data file does not
contain blank lines at the beginning or at the end of
the file. Also it is important that the data in each
line should be exactly as shown in (1) above, with no
spaces between the data items, with the data items
separated by a comma(,) and with each line terminated
by a normal carriage return (i.e. the CR/LF pair of
bytes). If the package 'locks up' after reading a
datafile, then in all probability the cause will be
found in a datafile which contains some of the errors
mentioned above. Ensure that the labels file also
contains no blank lines and that the number of value
labels is consistent with the data set. In particular,
try to ensure that variable names are spelled
consistently and in upper case.
(4) Take care with specifying your drive and MS-DOS
filenames which should not contain embedded
blanks or unconventional characters.
A typical filename might be : a:myfile.txt
Note : no spaces, filename of eight characters or
less, extension specified.
(5) Expand your knowledge by reading appropriate
statistical texts if necessary.
(6) USE the CTRL-BREAK keys to abort a module should
you find that you have made an irrecoverable error
and you wish to return to the principal menu.
DO NOT :
~~~~~~
(1) Attempt to write to a disk which is full or
write-protected.
(2) Use categories outside the range 1-9 ( or 0-8 )
in the modules TS-FREQ1 and TS-CROSS.
Collapse your data if necessary so that you do not
have more than nine categories in either direction
in these two modules.
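The checks listed under DO (3) above lend themselves to a small pre-flight test of a data file before it is given to TURBOSTATS. The Python sketch below is one way of doing this. The function name check_data_lines is hypothetical, and the requirement that every line holds the same number of items is an assumption that follows from matching columns to variables by position.

```python
def check_data_lines(lines, max_length=254):
    """Return (line number, problem) pairs for a comma-separated data file."""
    problems = []
    expected_items = None
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        if len(line) > max_length:
            problems.append((number, f"longer than {max_length} characters"))
        if " " in line:
            problems.append((number, "contains spaces"))
        items = line.split(",")
        if expected_items is None:
            expected_items = len(items)       # first line sets the width
        elif len(items) != expected_items:
            problems.append(
                (number, f"expected {expected_items} items, found {len(items)}"))
        for item in items:
            try:
                float(item)
            except ValueError:
                problems.append((number, f"not a number: {item!r}"))
    return problems


good = ["1,2,1", "1,1,2", "2,3,2", "1,2,1"]
bad = ["1,2,1", "", "1, 2,1", "1,2"]
print(check_data_lines(good))
print(check_data_lines(bad))
```

The lines are assumed to have had their CR/LF terminators removed already, for example by reading the file with Python's splitlines().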